Abstract

In recent years, globalization has prepared the ground for English to become the lingua franca of academia. Thus,
most highly prestigious international journals have defined English as their medium of publication. However,
even advanced language learners have difficulties in writing their research articles due to a lack of appropriate
lexical knowledge and unfamiliarity with the discourse conventions of academia. Because the underuse, overuse and
misuse of formulaic sequences or lexical bundles are often characteristic of non-native writers of English,
lexical bundle studies have recently been at the top of the agenda of corpus studies. Although the related
literature has represented specific genres or disciplines, no study has scrutinized lexical bundles in research
articles written in the educational sciences. Therefore, the current study compared the structural and
functional characteristics of lexical-bundle use in L1 and L2 research articles in English. The results revealed
that the use of lexical bundles by non-native speakers of English deviates from native speaker
norms. Furthermore, the results indicated an overuse of clausal or verb-phrase based lexical bundles in the
research articles of Turkish scholars, while their native counterparts used noun and prepositional phrase-based
lexical bundles more than clausal bundles.

Keywords: lexical bundles, corpus analysis, comparative analysis, research articles 

1. Introduction 

In recent years, English has become the dominant language of academia, as highly prestigious international
journals tend to define English as their medium of publication. It can be said that English has become the lingua
franca of academia (Swales, 2004; Hyland, 2009), and the dominance of English as the “Tyrannosaurus rex
of the linguistic grazing ground” (Swales, 1997: 376) has initiated a debate on how this will affect the
professional lives of international researchers, instructors, and students. Some critical scholars have discussed
this spread of English in academia under the headings of monolingualism, linguistic hegemony or imperialism,
cultural power, or homogenization (Pennycook, 1994; Phillipson, 2008; Tsuda, 1994; Uysal, 2014), while
others have pointed out the neutrality and the benefits of using English, such as economic gains (Brutt-Griffler,
2002; Spolsky, 2004) and international communication (Wright, 2004). Still, the number of articles published
by researchers whose first language is not English has been increasing progressively (Hyland, 2006); thus,
the issues surrounding international scholars who try to publish in English in order to exist in global academia
need further attention.

Despite these critical views, rowing against the current spread of English in academia
does not seem to help scholars and students. The English language has already established its own academic
discourse, which “constructs the social roles and relationships which create academics and students and which
sustain the universities, the disciplines, and the creation of knowledge itself” (Hyland, 2009, p. 1). For that
reason, non-native students and scholars have to learn the grammar, vocabulary and discourse conventions of
English to have a voice in academia. Among these, learning and appropriately using the rich
vocabulary of English is not as easy as learning its rule-based grammar. Therefore, considering the strong
relationship between vocabulary and writing (Coxhead, 1998; Schmitt, 2010), researchers have been striving
to create academic word lists since the 1950s to help non-native students and researchers become familiar with
academic and technical vocabulary. For example, Coxhead (1998, 2000) published two academic word lists






www.ccsenet.org/elt 


English Language Teaching 


Vol. 9, No. 6; 2016 


successively. The latest advances in computer technology also enabled linguistic research on large corpora,
and Biber, Johansson, Leech, Conrad, Finegan and Quirk (1999) carried out a large-scale study to find the
recurrent sequences, named lexical bundles, in academic prose and conversation. Biber et al.’s (1999) study
showed how ubiquitous lexical bundles are in academic prose, and subsequent studies (e.g. Biber, 2009;
Biber, Conrad, & Cortes, 2004; Chen & Baker, 2010; Durrant, 2015; Hyland, 2008a; Pan, Reppen, & Biber,
2016; Perez-Llantada, 2014) underlined the variation of lexical bundles across registers, genres and disciplines.
These studies on lexical bundles have often represented either a specific genre, such as student essays, or a
discipline, generally chemistry, history or biology; yet no study has scrutinized lexical bundles in the genre of
research articles written in the discipline of educational sciences. Therefore, the purpose of this
study is to compare the lexical bundles in English academic writing by writers from two first-language
backgrounds (English and Turkish) through structural and functional taxonomies, and to explore whether the use
of lexical bundles by non-native speakers of English deviates from native speaker norms. The results are expected
to give insights to writing instructors, non-native postgraduate students and scholars in the field of education.

1.1 Review of Literature 

Considering Firth’s (1957, p. 195) remark that “[y]ou shall know a word by the company it keeps,” it can be
understood that research on formulaicity dates back to the 20th century, but it was the proliferation of computer
technologies, which in turn prompted the development of corpus linguistics, that precipitated studies using
large corpora. First, large corpora such as the British National Corpus (Leech, 1992) and the Corpus of
Contemporary American English (Davies, 2009) began to serve researchers, teachers and students who wanted to see
how language is used in real contexts. Then, multicultural corpora such as the International Corpus of Learner
English (Granger, 2003) emerged to examine and explore usage in learner language. In addition to these
reference corpora, researchers compiled specialized written or spoken corpora for their own research
purposes. This line of work, which aims to compare and contrast usages in order to aid non-native learners of
English, entered a new era with the study of Biber et al. (1999) on lexical bundles.

Although other studies have examined recurrent sequences under different names, such as lexical clusters
(Hyland, 2008b), n-grams (Stubbs, 2007) and formulaic sequences (Wray, 2000), the study of Biber et al. (1999:
590) was the first to discuss lexical bundles under the current definition, in which they are
defined as “recurrent expressions, regardless of their idiomaticity, and regardless of their structural status”.

In the frequency-driven approach to identifying recurrent sequences or lexical bundles, different criteria
have been defined across studies. To decide on lexical bundles, cut-off points have been set
ranging from 10 to 40 occurrences per million words, depending on the size and the mode (spoken or written) of
a corpus. Another criterion has been dispersion: a lexical bundle must occur in at least five different texts or
in 10% of the texts in a corpus, which prevents idiosyncratic usages from being counted. However, it was observed
that corpus studies produced long lists of lexical bundles which do not mean much to learners of English.
Therefore, the need for alternative approaches emerged as a matter of inquiry in the language teaching field.
As a solution, Biber et al. (1999) first suggested a structural taxonomy, and then Biber, Conrad and Cortes (2004)
and Hyland (2008a) suggested categorizing lexical bundles according to their discourse functions. In the
following years, Ellis, Simpson-Vlach and Maynard (2008) and Simpson-Vlach and Ellis (2010, p. 488) studied the
educational and psychological validity of lexical bundles through their “formula teaching worth,” and they
revealed that frequency and association measures (e.g. mutual information score) should be integrated to
determine the lexical bundles which are pedagogically relevant for learners.

Considering the significance of lexical bundles in academic writing for native and non-native speakers (Schmitt,
2005), it is important to identify these “recurrent discourse building blocks” and group them into functional
categories. In this way, it would be possible to establish a beneficial list for pedagogical purposes. Such a list
will not only help corpus linguistics find a place in classrooms and academic writing courses (Romer, 2010),
but will also help address the problem, to which critics have drawn attention, of the lack of
theories that make lexical bundles accessible in the classroom (Granger, 2015).

Moreover, the underuse, overuse and misuse of formulaic sequences or lexical bundles are often characteristic
of non-native writers of English. Such flaws may cause problems when non-native researchers
try to publish in prestigious journals in the educational field (Bestgen & Granger, 2014). Thus, scholars
have strongly emphasized the importance of competence in using lexical bundles in academic writing (Cortes, 2004,
2008; Durrant & Mathews-Aydinli, 2011). Because they are “the most researched length for writing studies”
(Chen & Baker, 2010, p. 32) and offer “a wider variety of structures and functions to analyze” (Cortes,
2004, p. 401), four-word lexical bundles were selected as the scope of the current study. Also, four-word bundles
subsume three-word bundles in their structure and are ten times more frequent than five-word bundles (Cortes, 2004;
Perez-Llantada, 2014). Considering these suggestions, the current study aims to compare the structural and
functional characteristics of lexical-bundle use in L1 and L2 research articles, and to determine any
divergence from native norms by comparing native and non-native usage of lexical bundles. The
results of the study are expected to help researchers who try to publish in the field of education and
postgraduate students who submit their written assignments in English. The study addresses the following
research questions:

1) Are there any structural differences between the use of 4-word lexical bundles by native and non-native 
speakers of English? 

2) Are there any functional differences between the use of 4-word lexical bundles by native and non-native 
speakers of English? 

3) Which lexical bundles are shared by native and non-native speakers of English? Which lexical bundles are 
distinctive to native speakers of English? 

2. Method 

The current study adopts corpus linguistics as its methodology, which is “concerned primarily with the
description and explanation of the nature, structure and use of language and languages and with particular
matters such as language acquisition, variation and change” (Kennedy, 2014: 8). Corpus studies can be
divided into corpus-driven and corpus-based studies. As the current study did not restrict its scope to a
predefined category of lexical bundles, it is a corpus-driven study, an approach which can be defined as:

“a holistic approach to language in that the cumulative effect of repeated instances is taken to reflect the semiotic 
system; the text is seen as an integral part of its verbal context and, ultimately, no discontinuity is assumed 
between this and the wider context of situation, and the even wider context of culture” (Tognini-Bonelli, 2001: 
87). 

Based on a corpus-driven analysis, the current study aims to determine the shared and distinct uses of lexical 
bundles in the research article corpus in Educational Sciences. 

2.1 Data (Corpus) 

A specialized corpus was designed to answer the research questions of the current study. Research
articles in the field of educational sciences were collected from peer-reviewed journals according to three
criteria, namely topic, text type and author profile, as suggested by Salazar (2014). In other words, the corpus
comprised English research articles written in the field of educational sciences by native English scholars in
their L1 and by Turkish scholars in their L2. The size of the corpus was set at one million words, and each
sub-corpus (L1 English and L2 English) included roughly 500,000 words, similar to
Pan, Reppen and Biber (2016).

After the corpus was compiled, the PDF files were converted to text files for the analysis. Also, extra
information in the research articles, such as tables, author names, interview quotes, figures and page numbers,
was deleted so as not to confound the results of the analysis. The corpus statistics are given in Table 1. As can
be seen in Table 1, a notable difference was found in the length of the articles: the L2 sub-corpus required 101
articles to reach roughly the same number of tokens as the 79 articles of the L1 sub-corpus. When this difference
was examined, Turkish scholars were observed to rely solely on tables and figures without providing detailed
interpretation of the findings in the results section.


Table 1. Corpus statistics

                           L1 English    L2 English
Tokens (running words)     500,327       500,012
The number of articles     79            101
Types (distinct words)     17,832        17,059
Type/token ratio           3.65          3.55
STTR                       37.99         33.29
STTR std. dev.             61.07         66.34
Sentences                  17,052        17,796
Mean in words              28.64         27.00
Standard deviation         88.87         98.70

According to these statistics, the two sub-corpora are similar in size. However, the
lengths of the research articles differ due to the tendency of Turkish scholars to present their results only
with tables and figures. This might also be related to the rhetorical conventions and academic writing culture of
the Turkish language. The rest of the statistics show no marked difference in terms of distinct words, type/token
ratio, sentences and mean sentence length in words. After the two balanced sub-corpora were established, the next
step was to determine the cut-off criteria in order to identify the lexical bundles in the corpus.
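
The type/token ratio and the standardized type/token ratio (STTR) reported in Table 1 can be illustrated with a minimal sketch. It assumes a plain list of word tokens and a chunk size of 1,000 tokens for the STTR; the function names and the chunk size are illustrative assumptions, not the settings reported for this study, and the WordSmith implementation may differ in detail.

```python
from statistics import mean


def type_token_ratio(tokens):
    """Distinct words divided by running words, expressed as a percentage."""
    if not tokens:
        return 0.0
    return 100.0 * len(set(tokens)) / len(tokens)


def standardized_ttr(tokens, chunk_size=1000):
    """Mean TTR over consecutive full chunks of `chunk_size` tokens.

    chunk_size=1000 is an assumed default, not a value taken from the study.
    Averaging over equal-sized chunks makes corpora of different lengths comparable.
    """
    chunks = [
        tokens[i:i + chunk_size]
        for i in range(0, len(tokens) - chunk_size + 1, chunk_size)
    ]
    if not chunks:
        # Text shorter than one chunk: fall back to the plain ratio.
        return type_token_ratio(tokens)
    return mean(type_token_ratio(chunk) for chunk in chunks)
```

Because the plain ratio falls as a text grows longer, the chunk-based average is the more suitable figure for comparing sub-corpora that contain articles of different lengths.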

2.2 Procedure 

WordSmith 6.0 software (Scott, 2016) was used to identify the four-word lexical bundles in the corpus.
Four-word lexical bundles are the most frequently studied strings (Chen & Baker, 2010), as three-word
bundles are shorter versions of four-word bundles (Cortes, 2004). Previous studies adopted
different frequency and dispersion criteria when analyzing lexical bundles. For instance, Biber
et al. (1999) set the cut-off at 10 times per million words in at least five texts, while Pan, Reppen and
Biber (2016) treated clusters which occur 40 times per million words in at least five texts as lexical bundles.
The current study adopts Hyland’s cut-off criteria as a midpoint between Biber et al. (1999) and Pan, Reppen
and Biber (2016). According to Hyland (2008a, 2008b), a four-word lexical bundle should occur 20 times per
million words in at least 10% of the texts. Then, context- and content-dependent bundles, such as in the United
States and Ministry of National Education, and overlapping bundles (the purpose of the vs. purpose of the
study) were excluded from the bundle list. After retrieving the four-word bundles following these criteria, we
categorized them structurally and functionally through the structural taxonomy of Biber et al. (1999) and
the functional taxonomy of Hyland (2008a). As the initiator of lexical bundle research within the current
definition, Biber et al.’s taxonomy is still used in the structural classification of bundles. Although
functional taxonomies have been developed by several researchers (Biber, Conrad, & Cortes, 2004; Hyland, 2008a),
the current study makes use of Salazar’s (2014) taxonomy, which is an extended version of Hyland’s (2008a)
taxonomy developed to reflect the concerns of research writing.
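
The frequency and dispersion cut-offs described above can be sketched as follows. This is only an illustration of the two thresholds, not the WordSmith procedure used in the study: the function name, the input format (one list of tokens per article) and the parameter names are assumptions, and the manual exclusion of context-dependent and overlapping bundles is not covered.

```python
from collections import Counter, defaultdict


def extract_bundles(texts, n=4, per_million=20, min_text_share=0.10):
    """Return {bundle: raw_count} for n-grams that meet both cut-offs.

    texts: list of token lists, one per article (hypothetical input format).
    per_million: minimum normalized frequency (occurrences per million words).
    min_text_share: minimum share of texts in which the n-gram must occur.
    """
    total_tokens = sum(len(tokens) for tokens in texts)
    freq = Counter()
    texts_containing = defaultdict(set)
    for idx, tokens in enumerate(texts):
        for i in range(len(tokens) - n + 1):
            gram = tuple(tokens[i:i + n])
            freq[gram] += 1
            texts_containing[gram].add(idx)
    # Convert the per-million threshold into a raw count for this corpus size.
    min_count = per_million * total_tokens / 1_000_000
    min_texts = min_text_share * len(texts)
    return {
        " ".join(gram): count
        for gram, count in freq.items()
        if count >= min_count and len(texts_containing[gram]) >= min_texts
    }
```

As a worked case of the normalization, a threshold of 20 occurrences per million words corresponds to 10 raw occurrences in a text collection of 500,000 words.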

In the structural taxonomy of Biber et al. (1999), lexical bundles are analyzed under three main
categories: noun phrase-based bundles, prepositional phrase-based bundles, and verb phrase-based bundles. In
some studies, the last category is referred to as clausal or verb-phrase based lexical bundles. In his
functional taxonomy, Hyland (2008a) examined the discourse functions of the bundles in three categories:
research-oriented bundles, text-oriented bundles, and participant-oriented bundles. Research-oriented bundles
help writers to structure their activities and experiences of the real world with the subcategories of location,
procedure, quantification, description, and topic. Text-oriented bundles are concerned with the organization of
the text and its meaning as a message or argument, and this organization is carried out through transition signals,
resultative signals, structuring signals, and framing signals. Participant-oriented bundles focus on the writer and
the reader of the text with the help of stance and engagement features. The results of the structural and functional
analyses are presented in the following section.

3. Results and Discussion 

The analysis of the research articles written by native English scholars produced 32 four-word lexical bundles.
The most frequently used four-word lexical bundles in native writing were the end of the, at the end of, the extent
to which, in the context of, and it is important to. However, the number of four-word lexical bundles was
considerably larger in the L2 English research articles, where the analysis produced 98 four-word lexical bundles.
The most frequent lexical bundles in the non-native corpus were on the other hand, as a result of, the results of
the, it was found that, and at the end of. Although some previous studies argue that non-native speakers of
English use fewer (Erman, 2009; Howarth, 1998) and less varied (Granger, 1998) lexical bundles, the current
study contradicts them. Yet, the findings corroborate other studies (Hyland, 2008b; Öztürk, 2004;
Perez-Llantada, 2014; Pan, Reppen, & Biber, 2016) which revealed that non-native speakers use a broad range of
lexical bundles. For example, in the Turkish setting, Bal (2010) carried out a study on the English lexical bundles
written by Turkish scholars and found 99 lexical bundles in a one-million-word corpus, which is quite similar
to the results of the current study. In another study in the Turkish setting, the number of lexical bundles
written by Turkish postgraduates was also approximately twice as high as in the research articles of native
speakers and the MA/PhD dissertations of native postgraduate students (Öztürk, 2014). Thus, in the current study,
the number of four-word bundles in L2 English texts written by Turkish scholars indicates frequent use of
formulaicity and fixedness in the register of academic written language (Perez-Llantada, 2014) and shows that L2
academic prose also consists of a large number of lexical bundles, as suggested by Greaves and Warren (2010)
and Biber et al. (1999). In the following sections, the retrieved bundles are subjected to structural and
functional analyses, and the results are presented comparatively.

3.1 Comparison of Structural Types of Lexical Bundles

As can be seen in Table 2, L1 and L2 scholars wrote their research articles using different grammatical types
of lexical bundles. The grammatical types of lexical bundles were examined under two main categories,
clausal and phrasal. According to the results, native English scholars used noun phrase-based and
prepositional phrase-based structures, namely phrasal structures, rather than clausal or verb phrase-based
structures (40.6%, 50% and 6.3% respectively; see Examples 1, 2 and 3). On the other hand, L1 Turkish
scholars used clausal or verb phrase-based structures more than noun phrase-based and prepositional phrase-based
structures, namely phrasal structures (33.3%, 31.3% and 24.2% respectively; see Examples 4, 5 and 6).

Example 1: “... teacher educators may reconceptualize the nature of the conversations ...” (Language Teaching, 
Article 15) 

Example 2: “The scale measures for each of the components of democratic citizenship were combined through 
the ICCS dataset using IRT Rasch modelling.” (Social Studies Education, Article 1) 

Example 3: “It is important to stress that there is no direct evidence of this...” (Special Education, Article 6) 

Example 4: “In certain studies it was found that school principals' knowledge about the mission of counselling
services are limited...” (Elementary and Middle School Education, Article 29)

Example 5: “According to the results of the study, the diversities teachers mentioned the most are branch and 
political view diversities.” (Interdisciplinary Education Studies, Article 10) 

Example 6: “At the end of the study, it was observed that students’ achievements and self-regulation perceptions 
increased sharply.” (Mathematics Education, Article 7) 


Table 2. Distribution of structural subcategories

Category            Structural subcategory                                   L1 English   L2 English
NP-based            Noun phrase with of-phrase fragment                      10           20
                    Noun phrase with other post-modifier fragment            3            11
PP-based            Prepositional phrase with embedded of-phrase fragment    13           15
                    Other prepositional phrase (fragment)                    3            9
VP-based            Copula be + noun phrase/adjective phrase                 -            2
                    Anticipatory it + verb phrase/adjective phrase           1            14
                    (Verb phrase +) that-clause fragment                     -            6
                    (Verb/adjective +) to-clause fragment                    1            3
                    Passive verb + prepositional phrase fragment             -            5
                    Adverbial clause fragment                                -            2
                    Pronoun/noun phrase + be (+...)                          -            1
Other expressions                                                            1            10


These results corroborate other studies (Adel & Erman, 2012; Biber et al., 1999; Biber, Conrad, &
Cortes, 2004; Pan, Reppen, & Biber, 2016) which suggest that native speakers primarily use phrasal bundles in
academic prose. However, surprisingly, the Turkish writers of L2 English were found to overuse clausal or
verb phrase-based structures. This is similar to the findings of Pan, Reppen and Biber (2016), who found an
overuse of clausal structures in the L2 research articles of Chinese scholars. This overuse might be related to the
difficulty non-native speakers have in using noun phrase and prepositional phrase structures. For instance, the
sentence built on the verb phrase-based bundle in Example 7 (emphasis added) can be written in a shorter and more
native-like way, as in Example 8, to make the sentence more efficient.

Example 7: “It was found that there was a negative significant relationship among ...” (Mathematics 
Education, Article 7) 

Example 8: A negative significant relationship was found among... (Suggestion) 

Halliday (1989) also argues that written language should be more concise and that the ratio of lexical items
to the total number of running words should be higher in written language. In the abovementioned example, the total
number of words and the number of function words can be reduced considerably, as Example 8 shows.
Other reasons might be translation (Halliday, 1989) and a lack of writing proficiency (Pan, Reppen, & Biber,
2016), as learners are expected to shift from clausal structures to phrasal structures when they become more
proficient at academic writing. Therefore, the overuse of clausal or verb-phrase based lexical bundles might be
interpreted as a sign of a lack of expertise in academic writing.

3.2 Comparison of Functional Types of Lexical Bundles 

The lexical bundles (32 in L1 and 98 in L2) were subjected to an analysis of their functions, and one of
the 98 bundles in L2 could not be assigned to any category. As can be seen in Table 3, the native English
scholars frequently preferred to use research-oriented bundles, which “help writers to structure their activities
and experiences of the real world” (Hyland, 2008b, p. 49). These research-oriented bundles were used for description
(n=7, Example 9), grouping (n=3, Example 10), location (n=4, Example 11), procedure (n=1, Example 12) and
quantification (n=7, Example 13). Contrary to expectations and to some previous studies (Hyland, 2008b;
Adel & Erman, 2012), native English scholars used few participant-oriented bundles in the current study. The
only participant-oriented bundle (it is important to, Example 3) was a stance bundle that was used to “convey the
writer’s attitudes and evaluations” (Hyland, 2008b, p. 49). Some other researchers (e.g. Chen & Baker, 2010;
Salazar, 2010) also found that native writers used more research-oriented bundles and fewer participant-oriented
bundles. For instance, Salazar (2010) found that 51.3% of the bundles were research-oriented, 42.4% were
text-oriented, and 6.3% were participant-oriented, which is quite similar to the current study. In Chen and
Baker’s (2010) study, referential, discourse and stance bundles accounted for 60%, 21% and 19% respectively.
Although referential bundles, which correspond to research-oriented bundles in Hyland’s taxonomy,
are confirmed to be the most dominant functional category in academic prose in many studies (e.g. Biber, 2009;
Biber & Barbieri, 2007; Chen & Baker, 2010; Jukneviciene, 2009; Salazar, 2010), in other studies
(e.g. Biber, 2009; Biber & Barbieri, 2007; Jukneviciene, 2009) this category was followed by stance bundles,
which correspond to participant-oriented bundles in Hyland’s taxonomy, a finding that differs from the current
study. In contrast, text-oriented bundles were the most dominant functional category in some studies (e.g. Pan,
Reppen & Biber, 2016). Text-oriented bundles are important because they organize and deliver the arguments in
research articles. The text-oriented bundles in the L1 subcorpus were used to establish additive links (n=2,
Example 14), to mark cause and effect relations (n=1, Example 15) and to situate arguments by specifying limiting
conditions (n=6, Example 16).


Table 3. Distribution of functional subcategories

                               L1 English    L2 English
Research-oriented bundles      22 (68.8%)    30 (30.9%)
Text-oriented bundles          9 (28%)       64 (66%)
Participant-oriented bundles   1 (3.1%)      3 (3.1%)
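
The percentages in Table 3 follow from dividing each category count by the number of categorized bundles in the respective sub-corpus. The short sketch below reproduces this arithmetic; the function and variable names are illustrative. Note that the L2 denominator is 97 rather than 98, because one L2 bundle was not assigned to any category.

```python
def category_shares(counts):
    """Return each category's share of the categorized bundles, in percent."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


# Counts taken from Table 3.
l1_counts = {"research-oriented": 22, "text-oriented": 9, "participant-oriented": 1}
l2_counts = {"research-oriented": 30, "text-oriented": 64, "participant-oriented": 3}

l1_shares = category_shares(l1_counts)  # denominator: 32 bundles
l2_shares = category_shares(l2_counts)  # denominator: 97 categorized bundles
```

The contrast is clearest in the first two rows: research-oriented bundles make up about two thirds of the L1 bundles, whereas text-oriented bundles make up about two thirds of the L2 bundles.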


Example 9: “There are points of connection and convergence in the analysis of the drawings and the ways in 
which the children articulate their visual representations of temporality to demonstrate deep and philosophical 
insights.” (Arts Education, Article 2, emphasis added) 

Example 10: “As part of the game activities of both conditions, mentors held four reflection meetings...” 
(Instructional Technologies, Article 2, emphasis added) 

Example 11: “None of the pre-service teachers worked in long day care at the time of the study.” (Pre-school 
Education, Article 3, emphasis added) 

Example 12: “...these strategies have the potential to assist in the development of a mature learning 
community...” (Science Education, Article 3, emphasis added) 

Example 13: “Online systems also come in a variety of flavours based upon cost (purchased vs. no-cost)...” 
(Instructional Technologies, Article 4, emphasis added) 

Example 14: “The variations in ethos of subject departments as well as the whole school are therefore likely to 
impact...” (Interdisciplinary Education Studies, Article 1, emphasis added) 

Example 15: “...effectiveness within subjects may also differ as a result of the desired learning outcome...” 
(Physical Education, Article 2, emphasis added) 

Example 16: “A representative selection of responses are presented under these sub-headings and scrutinized in 
relation to the literature review categories and themes.” (Music Education, Article 1, emphasis added).

L1 Turkish writers of L2 English relied more on text-oriented bundles and less on participant-oriented bundles. The
overuse of text-oriented bundles by L2 English scholars can be seen as important, as this is the “most discursively
crafted” functional category (Hyland, 2012, p. 15). In the L2 subcorpus, the text-oriented bundles were used to
establish additive links (n=4, Example 17), to mark cause and effect relationships (n=8, Example 18), to compare
and contrast elements (n=9, Example 19), to situate arguments (n=11, Example 20), to signal accepted facts (n=3,
Example 21), to signal inferences (n=19, Example 22), to introduce aims (n=4, Example 23) and to organize the
discourse (n=6, Example 24). The most dominant functional category was text-oriented bundles or discourse
organizers in some studies (e.g. Chen & Baker, 2010; Pan, Reppen, & Biber, 2016), while in other studies
(e.g. Adel & Erman, 2012; Biber, 2009; Biber & Barbieri, 2007; Jukneviciene, 2009) the majority of
the bundles in university registers, namely academic prose, were research-oriented. The research-oriented
bundles in the L2 subcorpus were used to indicate quality, degree and existence (n=8, Example 25), to indicate
events, actions and methods (n=13, Example 26), to indicate quantities (n=5, Example 27), to indicate place (n=3,
Example 28) and to indicate groups (n=1, Example 29).

Example 17: “On the other hand, social studies and science teachers tried to improve their students’ 
comprehension.” (Elementary and Middle School Education, Article 18, emphasis added) 

Example 18: “The aim of the study is... to determine the effect of the differentiation approach on creative 
thinking skills of gifted students...” (Interdisciplinary Education Studies, Article 15, emphasis added) 

Example 19: “As a result related to the abilities of interns there was a significant difference between the 
perspectives towards...” (Physical Education, Article 2, emphasis added) 

Example 20: “...have examined inclusion practices in Turkey in terms of the attitudes and the opinions of the 
teachers ...” (Pre-school Education, Article 6, emphasis added) 

Example 21: “...it was found that there is a strong correlation among perception of self-efficacy, intrinsic 
motivation and extrinsic motivation.” (Science Education, Article 6, emphasis added) 

Example 22: “It can be said that the categories in the present study share similarities with some categories...” 
(Social Studies Education, Article 5, emphasis added) 

Example 23: “...the Pearson Product-Moment Correlation coefficient was calculated in order to determine the 
relationships ...” (Special Education, Article 3, emphasis added) 

Example 24: “The opinions of teachers about advantages and disadvantages of diversities are presented in 
Table 3...” (Interdisciplinary Education Studies, Article 10, emphasis added) 

Example 25: “The data obtained from the questionnaire used in the present study conducted to determine...” 
(Language Education, Article 11, emphasis added) 

Example 26: “The purpose of this study was to investigate Turkish high school students’ attitude and anxiety 
levels...” (Mathematics Education, Article 11, emphasis added) 

Example 27: “...attitude is one of the most important indicators of students' affective characteristics...” (Music 
Education, Article 1, emphasis added) 

Example 28: “...the physical education teachers studied the following at the beginning of the academic year...” 
(Physical Education, Article 4, emphasis added) 

Example 29: “...considers this perception as one of the myths about the nature of science.” (Science Education, 
Article 2) 

In both groups (L1 and L2), the scholars used very few participant-oriented lexical bundles in their research
articles. The participant-oriented bundles in the L2 subcorpus were it is important to, it is necessary to and it is
possible to, and they were used to convey the writer’s attitudes and evaluations (n=3), similar to the native
corpus. Among L2 writers with Chinese and Swedish as their L1, participant-oriented bundles were the least
frequently used functional category as well; therefore, the studies of Chen and Baker (2010) and Adel and Erman
(2012) corroborate the results of the current study. However, there are other studies (e.g.
Biber, 2009; Biber & Barbieri, 2007; Jukneviciene, 2009) in which participant-oriented bundles were used
more frequently than text-oriented or discourse-organizing bundles. Therefore, the differences among these studies,
which were conducted in different L1 settings, might be a sign of crosslinguistic influence.

Thus far, the four-word lexical bundles retrieved in the current study have been analyzed structurally and 
functionally. In the following sections, these bundles are examined under the categories of shared and distinct 
lexical bundles. The shared lexical bundles are those used by both native English scholars and Turkish 
non-native scholars writing in English. The bundles that were used only by native English speakers are 
presented under the category of distinct lexical bundles. 

3.3 Shared Lexical Bundles 

The number of four-word lexical bundles shared by native and non-native scholars was 13. Two frequently 
overlapping bundles (the beginning of the, the end of the) were removed so as not to inflate the numbers. 
Although many n-grams were shared by native English and non-native scholars, only thirteen of these 
structures qualified as bundles according to the cut-off criteria of the current study. The bundles presented in 
Table 4 were subjected to structural and functional analysis. Structurally, they fall into several categories. Most 
of the shared lexical bundles were prepositional phrases, either prepositional phrases with an embedded 
of-phrase fragment (as a result of, in terms of the, at the beginning of, at the end of, in the form of) or other 
prepositional phrases (on the other hand, at the same time). The noun structures found were of the type noun 
phrase + of-phrase fragment (the purpose of this), and the verb phrases were as follows: (verb or adjective) + 
to-clause fragment (to be able to) and anticipatory it + verb phrase/adjective phrase (it is important to). Only 
one expression was categorized under other expressions (as well as the). 
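
The partition into shared and distinct bundles described above can be illustrated with a minimal sketch. It is 
not the procedure used in the study: the toy sentences, the whitespace tokenization and the raw-frequency 
cutoff of 2 are all assumptions made for illustration, and the function names are hypothetical. 

```python
from collections import Counter


def four_grams(tokens):
    """Yield every contiguous four-word sequence as a space-joined string."""
    for i in range(len(tokens) - 3):
        yield " ".join(tokens[i:i + 4])


def bundles(texts, min_freq):
    """Return {four-gram: frequency} for sequences meeting a raw-frequency cutoff."""
    counts = Counter()
    for text in texts:
        counts.update(four_grams(text.lower().split()))
    return {gram: n for gram, n in counts.items() if n >= min_freq}


# Toy stand-ins for the two corpora (hypothetical sentences, not study data).
l1_texts = [
    "at the end of the term",
    "at the end of the study",
    "in the context of the study",
    "in the context of this work",
]
l2_texts = [
    "on the other hand at the end of the year",
    "at the end of the course on the other hand",
]

# The cutoff of 2 is an illustrative assumption, not the study's criterion.
l1 = bundles(l1_texts, min_freq=2)
l2 = bundles(l2_texts, min_freq=2)

shared = sorted(set(l1) & set(l2))   # bundles found in both corpora
only_l1 = sorted(set(l1) - set(l2))  # bundles distinct to the L1 corpus
only_l2 = sorted(set(l2) - set(l1))  # bundles distinct to the L2 corpus

print("shared:", shared)
print("L1 only:", only_l1)
print("L2 only:", only_l2)
```

In this toy data the shared set contains two overlapping sequences that come from the same longer string (at 
the end of, the end of the), which is the kind of overlap that was removed above so as not to inflate the counts. 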


Table 4. Shared lexical bundles 

Lexical bundle          Frequency in L2 English corpus    Frequency in L1 English corpus 
as a result of          112                               36 
as well as the          38                                26 
at the beginning of     29                                21 
at the end of           78                                66 
at the same time        43                                22 
in terms of the         55                                36 
in the form of          21                                25 
it is important to      20                                45 
on the other hand       156                               27 
the purpose of this     46                                23 
to be able to           23                                23 

The statistics also showed that the shared bundles fall under the research-oriented (n=5), text-oriented (n=5), 
and participant-oriented (n=1) functional categories. The five research-oriented lexical bundles functioned as 
description (to be able to), location (at the end of, at the same time, at the beginning of), and procedure (the 
purpose of this) bundles. The text-oriented bundles were as follows: additive (on the other hand, as well as the), 
causative (as a result of), and framing signals (in terms of the, in the form of). The only participant-oriented 
bundle was a stance bundle (it is important to). Similar to the current study, Pan, Reppen and Biber (2016) found 
that the largest functional category was text-oriented bundles in their native and non-native corpora. Hyland 
(2008a; 2008b) also found that two thirds of the bundles were text-oriented bundles because scholars in the 
soft sciences persuade readers in a more interpretative and less empiricist way (Hyland, 2004). 

As can be understood from Table 2, the frequency of occurrence differed between L1 English and L2 English, 
especially for some bundles such as on the other hand and as a result of. This can be interpreted as showing that 
L2 English scholars use some lexical bundles in their articles more than their L1 English counterparts do (Pan, 
Reppen, & Biber, 2016), because writers memorize lexical bundles and make use of these sequences in their 
writing exercises (Ellis, 2008). In other words, L2 learners might overuse the bundles which they are exposed 
to or which they learn (Li & Schmitt, 2009). Some researchers (e.g. Chen & Baker, 2010; Hyland, 2008a; 
Paquot, 2013, 2014; Salazar, 2010) revealed the overuse of some lexical bundles in different settings. In the 
Turkish setting, Öztürk (2014) also emphasized the repetitive use of bundles by Turkish L1 writers in advanced 
academic writing. 
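
The size of these frequency differences can be made concrete with a short calculation over the raw frequencies 
listed in Table 4. The sketch below is only illustrative: it divides the L2 count by the L1 count for each shared 
bundle, and it assumes that the raw counts are comparable across the two corpora, since corpus sizes are not 
restated in this section. A normalized rate (e.g. per million words) would be the safer basis for comparison. 

```python
# Raw frequencies copied from Table 4 as (L2 English corpus, L1 English corpus).
table4 = {
    "as a result of": (112, 36),
    "as well as the": (38, 26),
    "at the beginning of": (29, 21),
    "at the end of": (78, 66),
    "at the same time": (43, 22),
    "in terms of the": (55, 36),
    "in the form of": (21, 25),
    "it is important to": (20, 45),
    "on the other hand": (156, 27),
    "the purpose of this": (46, 23),
    "to be able to": (23, 23),
}

# Ratio above 1: more frequent in the L2 corpus; below 1: more frequent in L1.
# Assumption: raw counts are treated as directly comparable.
ratios = {bundle: l2 / l1 for bundle, (l2, l1) in table4.items()}

# Rank bundles from the strongest L2 preference to the strongest L1 preference.
ranked = sorted(ratios.items(), key=lambda item: item[1], reverse=True)

for bundle, ratio in ranked:
    print(f"{bundle:<22}{ratio:5.2f}")
```

Under this simple ratio, on the other hand and as a result of rank first and second, in line with the two bundles 
singled out above, while it is important to and in the form of are the only shared bundles with a ratio below 1. 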

3.4 Distinct Lexical Bundles 

There were 104 four-word lexical bundles which were not shared by native and non-native scholars. Nineteen 
of these bundles were used only by the native English scholars, and 85 of them were used only by the 
Turkish scholars. To analyze the distinct lexical bundles rigorously, overlapping bundles and context- and 
content-dependent bundles were removed from the analysis list. First, the bundles which were used only by the 
native English speakers were subjected to structural and functional analysis. Most of the distinct lexical 
bundles used by English L1 scholars were prepositional phrases, especially those with an embedded 
of-phrase fragment (in the context of, as part of the, at the time of, in the development of, in a variety of, as part 
of a, for each of the, within the context of) and other prepositional phrases (in relation to the). This category was 
followed by noun phrases with an of-phrase fragment (the role of the, the context of the, the development of the, 
the nature of the, the development of a, the importance of the, the impact of the) and noun phrases with other 
post-modifier fragments (the extent to which, the ways in which, the way in which). Of these bundles, six 
(at the time of, in a variety of, as part of a, the context of the, the development of a, the impact of the) were never 
used by the Turkish scholars. Although Turkish scholars used more lexical bundles (n=98 vs. n=32) in their 
research articles, they underused the ones which were frequently used by the native English scholars. For 
instance, 36.5% (n=31) of the bundles that were distinctive to Turkish scholars were clausal or verb-phrase 
structures, and the percentage of noun-phrase bundles was 32.9 (n=28). In contrast to the dominance of 
prepositional phrases in the writing of English L1 scholars, Turkish scholars underused prepositional-phrase 
lexical bundles (20%, n=17) in their research articles. 
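
The structural percentages reported for the Turkish-only bundles can be checked against the total of 85 
distinct bundles with a few lines of arithmetic. This is a minimal sketch: the category labels are shorthand 
chosen here, and the prepositional-phrase count is not taken as given but derived from the reported 20% share, 
which is an assumption about how that figure was computed. 

```python
# Total number of bundles used only by the Turkish scholars, as reported above.
total_distinct_l2 = 85

# Counts reported for two structural categories (labels are shorthand).
reported_counts = {
    "clausal or verb-phrase": 31,
    "noun-phrase": 28,
}

# Percentage share of each category, rounded to one decimal place.
shares = {
    label: round(100 * count / total_distinct_l2, 1)
    for label, count in reported_counts.items()
}

# Assumption: the 20% share of prepositional-phrase bundles is taken over
# the same total of 85, which implies the count computed here.
implied_prepositional = round(0.20 * total_distinct_l2)

for label, share in shares.items():
    print(f"{label}: {share}%")
print("implied prepositional-phrase count:", implied_prepositional)
```

The computed shares agree with the reported 36.5% and 32.9%, and a 20% share of 85 corresponds to 17 
prepositional-phrase bundles. 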


Table 5. The distinct bundles used by native English scholars 

Lexical bundle            Frequency in L1 English corpus    Frequency in L2 English corpus 
in the context of         53                                12 
as part of the            31                                0 
at the time of            24                                0 
in the development of     24                                11 
in a variety of           24                                0 
as part of a              23                                0 
for each of the           23                                6 
within the context of     24                                8 
in relation to the        37                                13 
the role of the           22                                7 
the context of the        25                                0 
the development of the    22                                11 
the nature of the         22                                11 
the development of a      20                                0 
the importance of the     20                                13 
the impact of the         20                                0 
the extent to which       54                                19 
the ways in which         36                                5 
the way in which          38                                14 


When the distinct lexical bundles which were used only by native English scholars were classified according to 
their functions, 15 research-oriented and 4 text-oriented bundles were extracted. The research-oriented bundles 
that were distinctive to native English scholars functioned as description (the nature of the, the importance of 
the, the impact of the, the extent to which, the ways in which and the way in which), procedure (in the 
development of, the role of the, the development of the and the development of a), grouping (as part of the, as 
part of a and for each of the), location (at the time of), and quantification (in a variety of). All of the text-oriented 
bundles functioned as framing signals (in the context of, within the context of, in relation to the and the context 
of the) to situate the arguments of the scholars. On the other hand, the distinctive bundles used by L1 Turkish 
scholars mainly comprised text-oriented bundles (n=59), and this category was followed by research-oriented 
bundles (n=23). Turkish scholars also used two participant-oriented stance bundles (it is necessary to and it is 
possible to) in their research articles. Perez-Llantada (2014) also found similar divergent participant-oriented 
stance bundles (e.g. it is necessary to), and revealed that these bundles were used to attest the claims proposed 
in the previous sentences. 

4. Conclusion and Implications 

Lexical bundle studies have recently been on the top of the agenda of corpus studies, but the related literature has 
represented specific genres, such as learner essays, prospectus and so on, or disciplines, such as history, 
chemistry and engineering. In this regard, no study has scrutinized lexical bundles in the research articles that are 
written in the educational sciences. Therefore, the current study compared the lexical bundles in the research 
articles of native English scholars and non-native scholars in educational sciences through the structural and 
functional taxonomies, and revealed that the usages of lexical bundles by the non-native speakers of English 
deviated from the native speaker norms. The results of this comparison were consistent with some other studies 
(Bal, 2010; Hyland, 2008b; Öztürk, 2004; Perez-Llantada, 2014; Pan, Reppen, & Biber, 2016) in finding that 
non-native speakers used a broad range of lexical bundles. In other words, the English research articles of 
Turkish scholars consisted of a larger number of and more varied four-word lexical bundles than the English 
research articles of native English scholars did. 

The comparison of functional and structural categories of lexical bundles was another concern in the current 
study. Turkish scholars were observed to overuse clausal or verb-phrase based lexical bundles in their research 
articles. These results, which are congruent with the results of Pan, Reppen, and Biber (2016), might be related 
to Turkish scholars’ limited command of noun phrase and prepositional phrase structures, since the number of 
clausal or verb-phrase based structures can be reduced to a great extent, as suggested by Halliday (1989). 
Other reasons might be translation (Halliday, 1989) and lack of writing proficiency (Pan, Reppen, & Biber, 2016). 
Based on these results, writing instructors might focus on the reduction strategies in their writing classes for a 
shift from clausal or verb-phrase based structures to phrasal structures so that the students can improve their 
writing in a native-like manner and present their arguments succinctly. In terms of functional categorization, 
Turkish writers of L2 English relied heavily on text-oriented bundles. Although the use of lexical bundles in the 
“most discursively crafted” functional category can be regarded as desirable, the results should be approached 
cautiously. First, this might be related to “the more discursive and evaluative patterns of argument in the soft 
knowledge fields” (Hyland, 2008a), and secondly early career researchers might heavily rely on structuring 
signals to maintain cohesion and coherence (Bunton, 1999). Thirdly, the overuse of the text-oriented bundles 
might be a sign of the lack of syntactic and lexical knowledge of non-native speakers (Hinkel, 2001). Therefore, 
these overuses should be analyzed thoroughly and qualitatively in further studies. 

Another issue is that the results of the current study should be read with consideration of potential transfer 
from the native languages of the scholars. Many researchers have pointed to potential crosslinguistic influence 
on lexical bundles, but only a few have attempted to scrutinize it, owing to the methodological gap in the 
comparison of languages. For instance, Allen (2010) revealed that Japanese academic writers overused some 
English lexical bundles which have L1 equivalents in Japanese. In Paquot’s studies (2013, 2014), 
French-speaking learners of English were observed to use translational equivalents of the most frequent 
bundles in L2 writing. Perez-Llantada (2014) also analyzed the translational equivalents of lexical bundles in 
English, and found a translational equivalent for 17% of the total bundles. Therefore, the current study also 
underlines the need for a study on the crosslinguistic influence on lexical bundles within the Turkish context, 
as suggested by other Turkish researchers (Bal, 2010; Öztürk, 2014). However, longer lexical bundles in English 
can be expressed even with one-word expressions due to the agglutinative morphology of Turkish (Durrant, 
2013). Hence, lexical bundles in Turkish should be analyzed from a different perspective to see if there is a 
crosslinguistic influence. 

Considering the results of the current study, some qualitative studies can also be carried out to analyze the usages 
of lexical bundles with a small and specialized corpus since corpus-based studies are likely to contribute to the 
development of writers and the design of academic writing courses. Also, the redundant use of clausal bundles 
by non-native scholars seems to show their lack of mastery in English academic writing (Cortes, 2004, 
2008; Durrant & Mathews-Aydınlı, 2011; Li & Schmitt, 2009; Romer, 2009), and, as a solution, explicit or 
implicit corpus-informed instruction (Unaldi, Bayrakci, Akpinar, & Dolas, 2013) might help scholars to shift 
from clausal structures to phrasal structures. 